04. Fruit Game Sample
Fruit
As with the previous PyTorch Cartpole example, the agent learns "from vision," translating the raw pixel array into actions using DQN. In this 2D graphical example, the agent appears at a random location and must find the "fruit" object to gain the reward and win the episode before running out of bounds or before the timeout period expires. The agent has 5 possible actions to choose from: up, down, left, right, or none, which it uses to navigate the screen toward the object.
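For reference, the 5-action space can be represented as a small C++ enum. The sketch below is illustrative: ACTION_NONE and NUM_ACTIONS appear in the project code shown later on this page, while the remaining names are assumptions standing in for the project's actual AgentAction definition.

// Illustrative sketch of the agent's discrete action space.
// ACTION_NONE and NUM_ACTIONS are referenced in fruit.cpp; the
// other names are placeholders for the real AgentAction enum.
enum AgentAction
{
	ACTION_UP = 0,   // move up
	ACTION_DOWN,     // move down
	ACTION_LEFT,     // move left
	ACTION_RIGHT,    // move right
	ACTION_NONE,     // stay in place
	NUM_ACTIONS      // total number of actions (5)
};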
Implementation
The fruit code looks very similar to the catch code. It's important to note that the same agent class is used in both environments!
// Create reinforcement learner agent in PyTorch
dqnAgent* agent = dqnAgent::Create(gameWidth, gameHeight,
NUM_CHANNELS, NUM_ACTIONS, OPTIMIZER,
LEARNING_RATE, REPLAY_MEMORY, BATCH_SIZE,
GAMMA, EPS_START, EPS_END, EPS_DECAY,
USE_LSTM, LSTM_SIZE, ALLOW_RANDOM, DEBUG_DQN);
The parameter values are slightly different (the frame size and number of channels have changed), but the algorithm for training the network to produce actions from inputs remains the same.
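As a rough illustration of those parameters, the settings behind the call above typically look something like the following. The values below are placeholders for illustration only, not the sample's actual defaults; the one grounded value is the 48x48 frame size mentioned later on this page.

// Illustrative settings only -- the real defaults are defined in fruit.cpp
#define NUM_CHANNELS  1      // e.g. one grayscale input channel (assumption)
#define NUM_ACTIONS   5      // up, down, left, right, none
#define GAMMA         0.9f   // discount factor (placeholder value)
#define EPS_START     0.9f   // initial exploration rate (placeholder value)
#define EPS_END       0.05f  // final exploration rate (placeholder value)
#define EPS_DECAY     200    // exploration decay rate (placeholder value)

The gameWidth and gameHeight values default to the 48x48 frame size and can be overridden from the command line (see Alternate Arguments below).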
The environment is more complicated for fruit than it is for catch, so it has been extracted into the fruitEnv.cpp module and its own class, FruitEnv. The environment object, named fruit, is instantiated in the fruit.cpp module.
// Create Fruit environment
FruitEnv* fruit = FruitEnv::Create(gameWidth, gameHeight, epMaxFrames);
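Based on how it is used in fruit.cpp, the FruitEnv class exposes at least a Create() factory and an Action() method. The sketch below is inferred from those call sites only; the signatures are approximate, and anything beyond these two members is an assumption.

// Approximate FruitEnv interface, inferred from its usage in fruit.cpp
class FruitEnv
{
public:
	// allocate an environment with the given frame size and per-episode frame limit
	static FruitEnv* Create( uint32_t width, uint32_t height, uint32_t maxEpisodeFrames );

	// apply the agent's action, write the resulting reward into *reward,
	// and return true if the episode has ended
	bool Action( AgentAction action, float* reward );
};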
We can trace the handoff between the agent and the environment through the following code snippet, located in the main game loop of the fruit.cpp module:
// Ask the agent for their action
int action = 0;

if( !agent->NextAction(input_tensor, &action) )
	printf("[deepRL] agent->NextAction() failed.\n");

if( action < 0 || action >= NUM_ACTIONS )
	action = ACTION_NONE;

// Provide the agent's action to the environment
const bool end_episode = fruit->Action((AgentAction)action, &reward);
In this snippet, action is the variable that contains the agent object's next action, based on the previous environment state represented by the input_tensor variable. The reward is determined on the last line, when the action is submitted to the environment object named fruit.
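To close the loop, the reward and end-of-episode flag are handed back to the agent so the transition can be stored in replay memory and used for training. A hedged sketch of that step is below; it assumes the agent exposes a NextReward() method, and the exact call in fruit.cpp may differ.

// Hand the reward back to the agent (sketch -- assumes a NextReward() method)
if( !agent->NextReward(reward, end_episode) )
	printf("[deepRL] agent->NextReward() failed.\n");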
Quiz - Fruit Rewards
The fruit reward function can be implemented in a number of different ways. Below are several possible reward functions for the game that compare the previous and current distances between the agent and its goal. Match each to a description in the quiz below.
A
*reward = (lastDistanceSq > fruitDistSq) ? 1.0f : 0.0f;
B
*reward = (sqrtf(lastDistanceSq) - sqrtf(fruitDistSq)) * 0.5f;
C
*reward = (sqrtf(lastDistanceSq) - sqrtf(fruitDistSq)) * 0.33f;
D
*reward = exp(-(fruitDistSq/worldWidth/1.5f));
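For context on the variables above: fruitDistSq and lastDistanceSq are the squared distances between the agent and the fruit on the current and previous frames, presumably tracked inside fruitEnv.cpp. A hypothetical helper for computing such a value might look like this:

// Hypothetical helper -- squared Euclidean distance between agent and fruit.
// The real environment maintains its own equivalent of these values.
static inline float distanceSq( float agentX, float agentY, float fruitX, float fruitY )
{
	const float dx = fruitX - agentX;
	const float dy = fruitY - agentY;
	return dx * dx + dy * dy;
}

With lastDistanceSq saved from the previous frame, A grants a fixed reward whenever the agent moves closer, B and C reward in proportion to how much closer it moved, and D rewards proximity to the fruit directly.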
QUIZ QUESTION:
Match each description to the reward functions shown above.

ANSWER CHOICES:

Description | Code snippet label
---|---
 | B
 | D
 | A
 | C
Running Fruit
To test the fruit sample, open the desktop in the “Test the API” workspace, open a terminal, and once again navigate to the build directory with
$ cd /home/workspace/RoboND-DeepRL-Project/build
Launch the following executable from the terminal in the build directory:
$ cd x86_64/bin
$ ./fruit
It should achieve 85% accuracy after around 100 episodes within the default 48x48 environment.
Alternate Arguments
Optional command-line parameters for fruit can be used to change the size of the pixel array and limit the number of frames per episode:
$ ./fruit --width=64 --height=64 --episode_max_frames=100